Abstract: The scaled complex Wishart distribution is a widely used model for multilook full polarimetric SAR data whose adequacy has been attested in the literature. Classification, segmentation, and image analysis techniques which depend on this model have been devised, and many of them employ some type of dissimilarity measure. In this paper we derive analytic expressions for four stochastic distances between relaxed scaled complex Wishart distributions in their most general form and in important particular cases. Using these distances, inequalities are obtained which lead to new ways of deriving the Bartlett and revised Wishart distances. The expressiveness of the four analytic distances is assessed with respect to the variation of parameters. Such distances are then used for deriving new test statistics, which are proved to have asymptotic chi-square distribution. Adopting the test size as a comparison criterion, a sensitivity study is performed by means of Monte Carlo experiments suggesting that the Bhattacharyya statistic outperforms all the others. The power of the tests is also assessed. Applications to actual data illustrate the discrimination and homogeneity identification capabilities of these distances.

Index Terms: statistics, image analysis, information theory, polarimetric radar, contrast measures.

I. INTRODUCTION 

POLARIMETRIC Synthetic Aperture Radar (PolSAR) devices transmit orthogonally polarized pulses towards a target, and the returned echo is recorded with respect to each polarization. Such remote sensing apparatus provides the means for a better capture of scene information when compared to its univariate counterpart, namely the conventional SAR technology, and complementary information with respect to other remote sensing modalities [1], [2].

PolSAR can achieve high spatial resolution due to its coherent processing of the returned echoes [3]. Being multichanneled by design, PolSAR also allows individual characterization of the targets in various channels. Moreover, it enables the identification of covariance structures among channels.

This work was supported by CNPq, Fapeal and FACEPE, Brazil.

A. C. Frery is with the Instituto de Computação, Universidade Federal de Alagoas, BR 104 Norte km 97, 57072-970, Maceió, AL, Brazil, email: acfrery@gmail.com

A. D. C. Nascimento and R. J. Cintra are with the Departamento de Estatística, Universidade Federal de Pernambuco, Cidade Universitária, 50740-540, Recife, PE, Brazil, e-mails: abraao.susej@gmail.com and rjdsc@de.ufpe.br

Resulting images from coherent systems are prone to a particular interference pattern called speckle [3]. This phenomenon can seriously affect the interpretation of PolSAR imagery [2]. Thus, specialized signal analysis techniques are usually required.

Segmentation [4], classification [5], boundary detection [6], [7], and change detection [8] techniques often employ dissimilarity measures for data discrimination. Such measures have been used to quantify the difference between image regions, and are often called 'contrast measures'. The analytical derivation of contrast measures and their properties is an important avenue for image understanding. Methods based on numerical integration have several disadvantages with respect to closed formulas, such as lack of convergence of the iterative procedures and high computational cost. Stochastic distances between models for PolSAR data often require dealing with integrals whose domain is the set of all positive definite Hermitian matrices.

Goudail and Réfrégier [9] applied stochastic measures to characterize the performance of target detection and segmentation algorithms in PolSAR image processing. In that study, both Kullback-Leibler and Bhattacharyya distances were considered as tools for quantifying the dissimilarity between circular complex Gaussian distributions. The Bhattacharyya measure was reported to possess better contrast capabilities than the Kullback-Leibler measure. However, the statistical properties of the measures were not explicitly considered in that work.



Erten et al. [10] derived a "coherent similarity" between PolSAR images based on the mutual information. Morio et al. [11] applied the Shannon entropy and Bhattacharyya distance for the characterization of polarimetric interferometric SAR images. They decomposed the Shannon entropy into the sum of three terms with physical meaning.

PolSAR theory prescribes that the returned (backscattered) signal of distributed targets is adequately represented by its complex covariance matrix. Goodman [12] presents a comprehensive analysis of complex Gaussian models, along with the connection between the class of complex covariance matrices and the Wishart distribution. Indeed, the complex scaled Wishart distribution is widely adopted as a statistical model for multilook full polarimetric data [2].

Conradsen et al. [13] proposed a methodology based on the likelihood-ratio test for the discrimination of two Wishart distributed targets, leading to a test statistic that takes into account the complex covariance matrices of PolSAR images. In a similar fashion, hypothesis tests for monopolarized SAR data were proposed in [14].

In this paper, we present analytic expressions for the Kullback-Leibler, Rényi (of order $\beta$), Bhattacharyya, and Hellinger distances between scaled complex Wishart distributions in their most general form and in important particular cases. Frery et al. [15] obtained analytical expressions for these distances, as well as for the $\chi^2$ distance, and showed that the last one is numerically unstable. Therefore, tests based on the $\chi^2$ distance are not considered in the present work.

We also verify that those distances present scale invariance with respect to their covariance matrices. Using such distances, we derive inequalities which depend on covariance matrices; two among them, obtained from Kullback-Leibler and Hellinger distances, provide alternative forms for deriving the revised Wishart [16] and Bartlett [5] distances, respectively.

Besides advancing the comparison of samples by means of their covariance matrices, the proposed distances provide a means for contrasting images rendered by different numbers of looks.

Considering the hypothesis test methodology proposed by Salicrú et al. [17], the derived distances are multiplied by a coefficient which involves the sizes of two samples of PolSAR images. The asymptotic and finite-sample behavior of the resulting quantities is studied.

In order to quantify the sensitivity of the distances, 
we perform Monte Carlo experiments in several possible 
scenarios. We illustrate the behavior of these distances 
and their associated hypothesis tests with actual data. 

This paper unfolds as follows. Section II presents the scaled and the relaxed complex Wishart distributions and estimators for their parameters. Section III recalls the background of stochastic dissimilarities. Section IV presents the analytic expressions of distances between Wishart models, with a new way to derive the Bartlett and the revised Wishart distances. Section V illustrates the application of these distances in PolSAR image discrimination. Section VI concludes the paper.

II. The complex Wishart distribution

PolSAR sensors record intensity and relative phase data which can be presented as complex scattering matrices. In principle, these matrices consist of four complex elements $S_{HH}$, $S_{HV}$, $S_{VH}$, and $S_{VV}$, where H and V refer to the horizontal and vertical wave polarization states, respectively. Under the conditions of the reciprocity theorem [18], [19], we have that $S_{HV} = S_{VH}$. This scenario is realistic when natural targets are considered [13].

In general, we may consider systems with $p$ polarization elements, which constitute a complex random vector denoted by:

$$y = (S_1\; S_2\; \cdots\; S_p)^{t}, \tag{1}$$

where the superscript '$t$' indicates vector transposition. In PolSAR image processing, $y$ is often admitted to obey the multivariate complex circular Gaussian distribution with zero mean [12], whose probability density function is:

$$f_{y}(y;\Sigma) = \frac{1}{\pi^{p}|\Sigma|}\exp\left(-y^{*}\Sigma^{-1}y\right),$$

where $|\cdot|$ is the determinant of a matrix or the absolute value of a scalar, the superscript '$*$' denotes the complex conjugate transpose of a vector (the complex conjugate of a scalar), $\Sigma$ is the covariance matrix of $y$ given by

$$\Sigma = E\{yy^{*}\} = \begin{bmatrix} E\{S_1 S_1^{*}\} & E\{S_1 S_2^{*}\} & \cdots & E\{S_1 S_p^{*}\} \\ E\{S_2 S_1^{*}\} & E\{S_2 S_2^{*}\} & \cdots & E\{S_2 S_p^{*}\} \\ \vdots & \vdots & \ddots & \vdots \\ E\{S_p S_1^{*}\} & E\{S_p S_2^{*}\} & \cdots & E\{S_p S_p^{*}\} \end{bmatrix},$$

and $E\{\cdot\}$ is the statistical expectation operator. Besides being Hermitian and positive definite, the covariance matrix $\Sigma$ contains all the necessary information to characterize the backscattered data under analysis [2].

In order to enhance the signal-to-noise ratio, $L$ independent and identically distributed (iid) samples are usually averaged to form the $L$-looks covariance matrix [20]:

$$Z = \frac{1}{L}\sum_{i=1}^{L} y_i y_i^{*},$$

where $y_i$, $i = 1, 2, \ldots, L$, are realizations of (1). Under the aforementioned hypotheses, $Z$ follows a scaled complex Wishart distribution. Having $\Sigma$ and $L$ as parameters, such law is characterized by the following probability density function:

$$f_{Z}(Z;\Sigma,L) = \frac{L^{pL}|Z|^{L-p}}{|\Sigma|^{L}\,\Gamma_p(L)}\exp\left(-L\operatorname{tr}(\Sigma^{-1}Z)\right), \tag{2}$$

where $\Gamma_p(L) = \pi^{p(p-1)/2}\prod_{i=0}^{p-1}\Gamma(L-i)$, $L \geq p$, $\Gamma(\cdot)$ is the gamma function, and $\operatorname{tr}(\cdot)$ is the trace operator. This situation is denoted $Z \sim \mathcal{W}(\Sigma, L)$, and this distribution satisfies $E\{Z\} = \Sigma$, which is a Hermitian positive definite matrix [20]. In practice, $L$ is treated as a parameter and must be estimated. In [21], Anfinsen et al. removed the restriction $L \geq p$. The resulting distribution has the same form as in (2) and is termed the relaxed Wishart distribution, denoted as $\mathcal{W}_R(\Sigma, n)$. This model accepts variations of $n$ along the image, and will be assumed henceforth.
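The construction of the multilook matrix and the density (2) can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`sample_multilook`, `log_gamma_p`, `relaxed_wishart_logpdf`) and the 2 x 2 covariance matrix are hypothetical choices made for illustration, not part of the original study.

```python
import math
import numpy as np


def sample_multilook(sigma, looks, rng):
    """Draw one L-look matrix Z = (1/L) * sum_i y_i y_i^*, with y_i ~ CN(0, sigma)."""
    p = sigma.shape[0]
    chol = np.linalg.cholesky(sigma)
    # Circular complex Gaussian noise with identity covariance:
    # real and imaginary parts each have variance 1/2.
    w = (rng.standard_normal((p, looks))
         + 1j * rng.standard_normal((p, looks))) / math.sqrt(2.0)
    y = chol @ w
    return (y @ y.conj().T) / looks


def log_gamma_p(n, p):
    """log of Gamma_p(n) = pi^{p(p-1)/2} * prod_{i=0}^{p-1} Gamma(n - i)."""
    return (0.5 * p * (p - 1) * math.log(math.pi)
            + sum(math.lgamma(n - i) for i in range(p)))


def relaxed_wishart_logpdf(z, sigma, n):
    """Logarithm of density (2), with the number of looks n allowed to be real."""
    p = sigma.shape[0]
    logdet_z = np.linalg.slogdet(z)[1]
    logdet_sigma = np.linalg.slogdet(sigma)[1]
    trace_term = np.trace(np.linalg.solve(sigma, z)).real
    return (p * n * math.log(n) + (n - p) * logdet_z - n * logdet_sigma
            - log_gamma_p(n, p) - n * trace_term)


rng = np.random.default_rng(0)
sigma = np.array([[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]])  # illustrative covariance
samples = np.array([sample_multilook(sigma, 8, rng) for _ in range(4000)])
print(np.round(samples.mean(axis=0), 2))  # expected to be close to sigma, since E{Z} = Sigma
print(relaxed_wishart_logpdf(samples[0], sigma, 8.0))
```

Working with the log-density avoids overflow in the gamma functions and determinants for large $n$ or large-valued covariance matrices.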

Due to its optimal asymptotic properties, maximum likelihood (ML) estimation is employed to estimate $\Sigma$ and $n$. Let $\{Z_1, Z_2, \ldots, Z_N\}$ be a random sample of size $N$ obeying the $\mathcal{W}_R(\Sigma, n)$ distribution. If (i) the parameter $n$ is assumed to be a known quantity and (ii) the profile likelihood of $f_Z$ is considered in terms of $\Sigma$, we obtain the following estimator for $\Sigma$ [22]:

$$\widehat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N} Z_i.$$

Differentiating the profile log-likelihood obtained from (2) with respect to $n$ we obtain:

$$\frac{\partial}{\partial n}\ln\left[f_Z(Z;\widehat{\Sigma},n)\right] = p[\log(n) + 1] + \log\frac{|Z|}{|\widehat{\Sigma}|} - \operatorname{tr}(\widehat{\Sigma}^{-1}Z) - \sum_{i=0}^{p-1}\psi^{(0)}(n-i), \tag{3}$$

where $\psi^{(0)}(\cdot)$ is the digamma function [23, p. 258]. Thus, the ML estimator for $n$ is the solution of the nonlinear equation obtained by equating the sample average of (3) to zero. Several estimation methods for $n$ are discussed in [20].
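As an illustration of the estimation procedure, the sketch below computes $\widehat{\Sigma}$ as the sample mean and solves the averaged score equation for $n$ by bisection. It assumes NumPy; the digamma routine, the bisection bounds, the helper names, and the simulated 2 x 2 example are assumptions introduced here rather than the procedure used in the original experiments.

```python
import math
import numpy as np


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))


def sample_wishart(sigma, looks, size, rng):
    """Simulate `size` L-look matrices with covariance sigma; shape (size, p, p)."""
    p = sigma.shape[0]
    chol = np.linalg.cholesky(sigma)
    w = (rng.standard_normal((size, p, looks))
         + 1j * rng.standard_normal((size, p, looks))) / math.sqrt(2.0)
    y = chol @ w
    return y @ y.conj().transpose(0, 2, 1) / looks


def estimate_parameters(samples):
    """ML estimates (sigma_hat, n_hat) from an array of shape (N, p, p)."""
    p = samples.shape[1]
    sigma_hat = samples.mean(axis=0)
    # Constant part of the averaged score (3); tr(sigma_hat^{-1} Z_k) averages to p,
    # which cancels the "+ p" coming from p[log(n) + 1].
    c = np.mean(np.linalg.slogdet(samples)[1]) - np.linalg.slogdet(sigma_hat)[1]

    def score(n):
        return p * math.log(n) + c - sum(digamma(n - i) for i in range(p))

    # The score decreases in n; the upper bound 1e4 is an arbitrary assumption.
    lo, hi = p - 1 + 1e-6, 1e4
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return sigma_hat, 0.5 * (lo + hi)


rng = np.random.default_rng(1)
sigma = np.array([[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]])  # illustrative covariance
samples = sample_wishart(sigma, 8, 2000, rng)
sigma_hat, n_hat = estimate_parameters(samples)
print(np.round(sigma_hat, 2), round(n_hat, 2))
```

Bisection is used here only because the averaged score is monotone in $n$; any one-dimensional root finder would serve.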

Fig. 1 presents a polarimetric SAR image obtained by an EMISAR sensor over surroundings of Foulum, Denmark. The informed (nominal) number of looks is 8. According to Skriver et al. [24], the area exhibits three types of crops: (i) winter rape (B1), (ii) mixture of winter rape and winter wheat (B2), and (iii) beets (B3). Table I presents the resulting ML parameter estimates, as well as the sample sizes. The closest estimate of $n$ to the nominal number of looks occurs at the most homogeneous scenario, i.e., with beets. Notice that two out of three ML estimates of the number of looks are higher than the nominal number of looks. Similar overestimation was also noticed by Anfinsen et al. [20], who explained this phenomenon as an effect of the specular reflection on ocean scenarios. In our case, winter rape and, to a lesser extent, beets, appear smoother to the sensor than homogeneous targets.

Fig. 1. EMISAR image (HH channel) with selected regions from Foulum.

TABLE I
Parameter estimates on Foulum samples

Regions   $\widehat{n}$   $|\widehat{\Sigma}|$     # pixels
B1        9.216           2.507 × 10^{-?}          1131
B2        7.200           5.717 × 10^{-?}          1265
B3        8.555           4.114 × 10^{-?}          1155

Fig. 2 depicts the empirical densities of data samples from the selected regions. Additionally, the associated fitted marginal densities $\mathcal{W}_R(\Sigma, n)$ and $\mathcal{W}(\Sigma, 8)$ are displayed for comparison. In this case, the scaled Wishart density collapses to a gamma density as demonstrated in [25]:

$$f_{Z_i}(Z_i; n/\sigma_i^2, n) = \frac{n^{n} Z_i^{\,n-1}}{\sigma_i^{2n}\,\Gamma(n)}\exp\left(-nZ_i/\sigma_i^2\right),$$

for $i \in \{HH, HV, VV\}$, where $\sigma_i^2$ is the $(i,i)$-th entry of $\Sigma$, and $Z_i$ is the $(i,i)$-th entry of the random matrix $Z$.
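The gamma marginal above can be sanity-checked numerically. The sketch below uses only the Python standard library; `intensity_pdf`, `integrate`, and the parameter values are hypothetical names and choices for illustration. It verifies by midpoint integration that the density has unit mass and mean $\sigma_i^2$.

```python
import math


def intensity_pdf(z, sigma2, n):
    """Gamma density of one diagonal entry of Z (mean sigma2, shape n)."""
    if z <= 0.0:
        return 0.0
    log_pdf = (n * math.log(n) + (n - 1.0) * math.log(z)
               - n * math.log(sigma2) - math.lgamma(n) - n * z / sigma2)
    return math.exp(log_pdf)


def integrate(func, a, b, steps=20000):
    """Midpoint rule, adequate for this smooth integrand."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


sigma2, n = 2.0, 8.0  # illustrative values
total = integrate(lambda z: intensity_pdf(z, sigma2, n), 0.0, 20.0 * sigma2)
mean = integrate(lambda z: z * intensity_pdf(z, sigma2, n), 0.0, 20.0 * sigma2)
print(total, mean)  # expected to be close to 1 and to sigma2, respectively
```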
In order to assess the data fittings, Table II presents the Akaike information criterion (AIC) values and the sum of squares due to error (SSE) between the histogram $\widehat{f}_k$ of $Z_{HH}$ and the fitted densities $f_{Z_{HH}}(Z_k)$, with

$$\text{SSE} = \frac{\sum_{k=1}^{\#\,\text{pixels}}\left[\widehat{f}_k - f_{Z_{HH}}(Z_k)\right]^2}{\#\,\text{pixels}},$$

where # pixels denotes the number of considered pixels. This measure was used in [26]. In all cases, the $\mathcal{W}_R$ distribution presented the best fit for both measures. Table II also shows the Kolmogorov-Smirnov (KS) statistic and its p-value. It is consistent with the other results, i.e., the relaxed scaled Wishart distribution provides better descriptions of the data.

The most accurate fit is in region B2. The equivalent number of looks in this region is slightly smaller than the nominal one, as expected. These samples will be used to validate our proposed methods in Section IV-D.





Fig. 2. Histograms and empirical relaxed (solid curve) and original (dashed curve) densities of samples. Panels: (a) B1, (b) B2, (c) B3; horizontal axis: $Z$ ($\times 10^{-2}$).



III. STOCHASTIC DISSIMILARITIES 

In the following we adhere to the convention that a "divergence" is any non-negative function between two probability measures which obeys the identity of definiteness property [27, ch. 11, p. 328]. If the function is also symmetric, it is called a "distance". Finally, we understand "metric" as a distance which also satisfies the triangular inequality [28, ch. 1 and 14].

An image can be understood as a set of regions, in which the enclosed pixels are observations of random variables following a certain distribution. Therefore, stochastic dissimilarity measures can be used as image processing tools, since they may be able to assess the difference between the distributions that describe different image areas [14]. Dissimilarity measures were submitted to a systematic and comprehensive treatment in [29], [30] and, as a result, the class of $(h,\phi)$-divergences was proposed [17].

Assume that $X$ and $Y$ are random matrices associated with densities $f_X(Z;\theta_1)$ and $f_Y(Z;\theta_2)$, respectively, where $\theta_1$ and $\theta_2$ are parameter vectors. The densities are assumed to share a common support $\mathcal{A}$: the cone of Hermitian positive definite matrices [31]. The $(h,\phi)$-divergence between $f_X$ and $f_Y$ is defined by

$$D_{\phi}^{h}(X,Y) = h\left(\int_{\mathcal{A}} \phi\left(\frac{f_X(Z;\theta_1)}{f_Y(Z;\theta_2)}\right) f_Y(Z;\theta_2)\,\mathrm{d}Z\right), \tag{4}$$

where $h\colon (0,\infty) \to [0,\infty)$ is a strictly increasing function with $h(0) = 0$, $\phi\colon (0,\infty) \to [0,\infty)$ is a convex function, and indeterminate forms are assigned the value zero (we assume the conventions (i) $\phi(0) = \lim_{x\downarrow 0}\phi(x)$, (ii) $0\,\phi(0/0) = 0$, and, for $a > 0$, (iii) $0\,\phi(a/0) = \lim_{\epsilon\downarrow 0}\epsilon\,\phi(a/\epsilon) = a\lim_{x\to\infty}\phi(x)/x$) [32, pp. 31]. In particular, Ali and Silvey [29] presented a detailed discussion about the function $\phi$. The differential element $\mathrm{d}Z$ is given by

$$\mathrm{d}Z = \mathrm{d}Z_{11}\,\mathrm{d}Z_{22}\cdots \mathrm{d}Z_{pp}\prod_{\substack{i,j=1 \\ i<j}}^{p} \mathrm{d}\Re\{Z_{ij}\}\,\mathrm{d}\Im\{Z_{ij}\},$$

where $Z_{ij}$ is the $(i,j)$-th entry of matrix $Z$, and operators $\Re\{\cdot\}$ and $\Im\{\cdot\}$ return real and imaginary parts of their arguments, respectively [12].

Well-known divergences arise after adequate choices of $h$ and $\phi$. Among them, we examined the following: (i) Kullback-Leibler [33], (ii) Rényi, (iii) Bhattacharyya [34], and (iv) Hellinger [14]. As the triangular inequality is not necessarily satisfied, not every divergence measure is a metric [35]. Additionally, the symmetry property is not followed by some of these divergence measures. Nevertheless, such tools are mathematically appropriate for comparing the distribution of random variables [36]. The following expression has been suggested as a possible solution for this issue [33]:

$$d_{\phi}^{h}(X,Y) = \frac{D_{\phi}^{h}(X,Y) + D_{\phi}^{h}(Y,X)}{2}.$$

Functions $d_{\phi}^{h}\colon \mathcal{A}\times\mathcal{A} \to \mathbb{R}$ are distances over $\mathcal{A}$ since, for all $X, Y \in \mathcal{A}$, the following properties hold:

1) $d_{\phi}^{h}(X,Y) \geq 0$ (Non-negativity).

2) $d_{\phi}^{h}(X,Y) = d_{\phi}^{h}(Y,X)$ (Symmetry).

3) $d_{\phi}^{h}(X,Y) = 0 \Leftrightarrow X = Y$ (Definiteness).

Table III shows the functions $h$ and $\phi$ which lead to the distances considered in this work.

In the following we discuss integral expressions of these $(h,\phi)$-distances. For simplicity, we suppress the explicit dependence on $Z$ and $(\theta_1,\theta_2)$, reminding that the integration is with respect to $Z$ on $\mathcal{A}$.
TABLE II
AIC, SSE, AND KS STATISTICS VALUES FOR THE HH CHANNEL WITH RESPECT TO THE RELAXED AND ORIGINAL WISHART DISTRIBUTIONS

Regions   AIC                          SSE                      KS (p-value)
          W_R           W              W_R         W            W_R                        W
B1        -8401.105     -8354.817      58.169      76.853       0.070 (0.325 × 10^{-?})    0.051 (0.006)
B2        -4973.743     -4987.579      1.245       1.487        0.018 (0.789)              0.034 (0.110)
B3        -13725.650    -13734.330     2362.108    2836.599     0.063 (2.383 × 10^{-?})    0.059 (6.435 × 10^{-?})





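TABLE III
$(h,\phi)$-DISTANCES AND THEIR FUNCTIONS

$(h,\phi)$-distance      $h(y)$                                                                 $\phi(x)$
Kullback-Leibler         $y/2$                                                                  $(x-1)\log x$
Rényi (order $\beta$)    $\frac{1}{\beta-1}\log((\beta-1)y+1)$, $0 \leq y < \frac{1}{1-\beta}$   $\frac{x^{1-\beta}+x^{\beta}-\beta(x-1)-2}{2(\beta-1)}$, $0<\beta<1$
Bhattacharyya            $-\log(-y+1)$, $0 \leq y < 1$                                          $-\sqrt{x}+\frac{x+1}{2}$
Hellinger                $y/2$, $0 \leq y < 2$                                                  $(\sqrt{x}-1)^2$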



(i) The Kullback-Leibler distance:

$$d_{KL}(X,Y) = \frac{1}{2}\left[D_{KL}(X,Y) + D_{KL}(Y,X)\right] = \frac{1}{2}\left[\int f_X\log\frac{f_X}{f_Y} + \int f_Y\log\frac{f_Y}{f_X}\right] = \frac{1}{2}\int (f_X - f_Y)\log\frac{f_X}{f_Y}.$$

The divergence $D_{KL}$ has a close relationship with the Neyman-Pearson lemma [37] and its symmetrization has been suggested as a correction form of the Akaike information criterion [33].

(ii) The Rényi distance of order $\beta$:

$$\widetilde{d}_{R}^{\,\beta}(X,Y) = \frac{\log\int f_X^{\beta}f_Y^{1-\beta} + \log\int f_X^{1-\beta}f_Y^{\beta}}{2(\beta-1)},$$

where $0 < \beta < 1$. The divergence $D_R^{\beta}$ has been used for analysing geometric characteristics with respect to probability laws [38]. By the Fejér inequality [39], we have that

$$d_{R}^{\beta}(X,Y) = \frac{1}{\beta-1}\log\frac{\int f_X^{\beta}f_Y^{1-\beta} + \int f_X^{1-\beta}f_Y^{\beta}}{2} \leq \widetilde{d}_{R}^{\,\beta}(X,Y).$$

The distance $d_R^{\beta}$ proves to be more algebraically tractable than $\widetilde{d}_R^{\,\beta}$ for some manipulations with the complex Wishart density. Thus, we use the former in subsequent analyses.

(iii) The Bhattacharyya distance:

$$d_{B}(X,Y) = -\log\int\sqrt{f_X f_Y}.$$

Goudail et al. [40] showed that this distance is an efficient tool for contrast definition in algorithms for image processing.

(iv) The Hellinger distance:

$$d_{H}(X,Y) = 1 - \int\sqrt{f_X f_Y}.$$

Estimation methods based on the minimization of $d_H$ have been successfully employed in the context of stochastic differential equations [41]. This is the only bounded distance among the ones considered in this paper.
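The correspondence between the pairs $(h,\phi)$ of Table III and the integral expressions listed above can be checked numerically in the simplest setting $p = 1$, where the relaxed Wishart law reduces to a gamma law. The sketch below uses only the Python standard library; the order `BETA`, the two gamma densities, and all function names are assumptions chosen for illustration.

```python
import math


def gamma_pdf(z, sigma2, n):
    """Relaxed Wishart density for p = 1 (gamma with mean sigma2 and shape n)."""
    return math.exp(n * math.log(n) + (n - 1.0) * math.log(z)
                    - n * math.log(sigma2) - math.lgamma(n) - n * z / sigma2)


def integrate(func, a, b, steps=20000):
    """Midpoint rule on (a, b)."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


BETA = 0.8  # hypothetical order for the Renyi distance

# (h, phi) pairs as listed in Table III.
H_PHI = {
    "KL": (lambda y: y / 2.0,
           lambda x: (x - 1.0) * math.log(x)),
    "Renyi": (lambda y: math.log((BETA - 1.0) * y + 1.0) / (BETA - 1.0),
              lambda x: (x ** (1.0 - BETA) + x ** BETA
                         - BETA * (x - 1.0) - 2.0) / (2.0 * (BETA - 1.0))),
    "Bhattacharyya": (lambda y: -math.log(1.0 - y),
                      lambda x: -math.sqrt(x) + (x + 1.0) / 2.0),
    "Hellinger": (lambda y: y / 2.0,
                  lambda x: (math.sqrt(x) - 1.0) ** 2),
}


def h_phi_distance(name, fx, fy, a, b):
    """Evaluate h( integral of phi(fx/fy) * fy ) as in definition (4)."""
    h, phi = H_PHI[name]
    return h(integrate(lambda z: phi(fx(z) / fy(z)) * fy(z), a, b))


fx = lambda z: gamma_pdf(z, 1.0, 4.0)
fy = lambda z: gamma_pdf(z, 1.5, 4.0)
bc = integrate(lambda z: math.sqrt(fx(z) * fy(z)), 0.0, 40.0)
d_kl = h_phi_distance("KL", fx, fy, 0.0, 40.0)
d_r = h_phi_distance("Renyi", fx, fy, 0.0, 40.0)
d_b = h_phi_distance("Bhattacharyya", fx, fy, 0.0, 40.0)
d_h = h_phi_distance("Hellinger", fx, fy, 0.0, 40.0)
print(d_b, -math.log(bc))   # the two values are expected to agree
print(d_h, 1.0 - bc)        # the two values are expected to agree
print(d_kl, d_r)
```

The integration range is truncated at 40, where both densities are negligible but still representable, so the ratio $f_X/f_Y$ never divides by zero.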

When considering the distance between particular cases of the same distribution, only parameters are relevant. In this case, the parameters $\theta_1$ and $\theta_2$ replace the random variables $X$ and $Y$ as arguments of the discussed distances. This notation is in agreement with that of [17].

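In the following, the hypothesis test based on stochastic distances proposed by Salicrú et al. [17] is introduced. Let $M$-point vectors $\widehat{\theta}_1 = (\widehat{\theta}_{11}, \ldots, \widehat{\theta}_{1M})$ and $\widehat{\theta}_2 = (\widehat{\theta}_{21}, \ldots, \widehat{\theta}_{2M})$ be the ML estimators of parameters $\theta_1$ and $\theta_2$ based on independent samples of sizes $N_1$ and $N_2$, respectively. Under the regularity conditions discussed in [17, p. 380] the following lemma holds:

Lemma 1: If $\frac{N_1}{N_1+N_2} \xrightarrow[N_1,N_2\to\infty]{} \lambda \in (0,1)$ and $\theta_1 = \theta_2$, then

$$S_{\phi}^{h}(\widehat{\theta}_1,\widehat{\theta}_2) = \frac{2N_1N_2}{N_1+N_2}\,\frac{d_{\phi}^{h}(\widehat{\theta}_1,\widehat{\theta}_2)}{h'(0)\,\phi''(1)} \xrightarrow[N_1,N_2\to\infty]{\mathcal{D}} \chi^2_M, \tag{5}$$

where "$\xrightarrow{\mathcal{D}}$" denotes convergence in distribution and $\chi^2_M$ represents the chi-square distribution with $M$ degrees of freedom.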

Based on Lemma 1, statistical hypothesis tests for the null hypothesis $\theta_1 = \theta_2$ can be derived in the form of the following proposition.

Proposition 1: Let $N_1$ and $N_2$ be large and $S_{\phi}^{h}(\widehat{\theta}_1,\widehat{\theta}_2) = s$, then the null hypothesis $\theta_1 = \theta_2$ can be rejected at level $\alpha$ if $\Pr(\chi^2_M > s) \leq \alpha$.

We denote the statistics based on the Kullback-Leibler, Rényi, Bhattacharyya, and Hellinger distances as $S_{KL}$, $S_R^{\beta}$, $S_B$, and $S_H$, respectively.
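The decision rule of Proposition 1 can be sketched in a few lines. The example below uses only the Python standard library and is an illustration under stated assumptions: the constants $h'(0)\phi''(1)$ were obtained here by differentiating the functions of Table III (1 for Kullback-Leibler, $\beta$ for Rényi, 1/4 for Bhattacharyya and Hellinger), the chi-square tail probability is implemented only for even degrees of freedom, and all function names are hypothetical.

```python
import math


def scale_constant(name, beta=0.5):
    """h'(0) * phi''(1), derived from Table III (an inference, not quoted from the text)."""
    return {"KL": 1.0, "Renyi": beta, "Bhattacharyya": 0.25, "Hellinger": 0.25}[name]


def statistic_from_distance(distance, size1, size2, name, beta=0.5):
    """Statistic (5): 2 N1 N2 / (N1 + N2) * d / (h'(0) phi''(1))."""
    return 2.0 * size1 * size2 / (size1 + size2) * distance / scale_constant(name, beta)


def chi2_sf_even(s, dof):
    """P(chi2_dof > s) for even dof: exp(-s/2) * sum_{j < dof/2} (s/2)^j / j!."""
    if dof % 2 != 0:
        raise ValueError("closed form implemented for even degrees of freedom only")
    half = s / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def reject(distance, size1, size2, name, dof, alpha=0.05, beta=0.5):
    """Proposition 1: reject theta1 = theta2 when the p-value is at most alpha."""
    s = statistic_from_distance(distance, size1, size2, name, beta)
    return chi2_sf_even(s, dof) <= alpha


# Hypothetical input: an estimated Bhattacharyya distance of 0.05 between two
# samples of 100 pixels each, with M = 10 estimated parameters.
s_b = statistic_from_distance(0.05, 100, 100, "Bhattacharyya")
print(s_b, chi2_sf_even(s_b, 10), reject(0.05, 100, 100, "Bhattacharyya", 10))
```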



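IV. Analytic expressions, sensitivity, inequalities, and finite sample size behavior

In the following, analytic expressions for the stochastic distances $d_{KL}$, $d_R^{\beta}$, $d_B$, and $d_H$ between two relaxed complex Wishart distributions are derived (Section IV-A). We examine the special cases in terms of the parameter values: (i) $\Sigma_1 \neq \Sigma_2$ and $n_1 \neq n_2$, which corresponds to the most general case, (ii) same equivalent number of looks $n_1 = n_2 = n$ and different covariance matrices $\Sigma_1 \neq \Sigma_2$, and (iii) same covariance matrix $\Sigma_1 = \Sigma_2$ and different equivalent numbers of looks $n_1 \neq n_2$. Case (ii) is likely to be the most frequently used in practice since it allows the comparison of two possibly different areas from the same image. Case (iii) allows the assessment of a change in distribution due only to multilook processing on the same area.

The sensitivity of the tests to variations of parameters is qualitatively assessed and discussed in Section IV-B.

In Section IV-C we derive inequalities which $\Sigma_1$ and $\Sigma_2$ must obey. These inequalities lead to the Bartlett and revised Wishart distances in a different and simple way when compared to a well-known method available in the literature [42]. Distances are also shown to satisfy scale invariance with respect to $\Sigma$.

The performance of the tests for finite size samples is quantified by means of (i) Monte Carlo simulation and (ii) true data analysis in Section IV-D.

A. Analytic expressions

1) Kullback-Leibler distance:

Case (i):

$$d_{KL}(\theta_1,\theta_2) = \frac{n_1-n_2}{2}\left\{\log\frac{|\Sigma_1|}{|\Sigma_2|} - p\log\frac{n_1}{n_2} + p\left[\psi^{(0)}(n_1-p+1) - \psi^{(0)}(n_2-p+1)\right] + (n_2-n_1)\sum_{i=1}^{p-1}\frac{i}{(n_1-i)(n_2-i)}\right\} + \frac{\operatorname{tr}(n_2\Sigma_2^{-1}\Sigma_1 + n_1\Sigma_1^{-1}\Sigma_2)}{2} - \frac{p(n_1+n_2)}{2}. \tag{6}$$

Details of this derivation are given in Appendix A.

Case (ii):

$$d_{KL}(\theta_1,\theta_2) = n\left[\frac{\operatorname{tr}(\Sigma_1^{-1}\Sigma_2 + \Sigma_2^{-1}\Sigma_1)}{2} - p\right].$$

This result was also derived by Lee and Bretschneider [43] and applied to real PolSAR data for assessing separability of target classes.

Case (iii):

$$d_{KL}(\theta_1,\theta_2) = \frac{n_1-n_2}{2}\left\{-p\log\frac{n_1}{n_2} + p\left[\psi^{(0)}(n_1-p+1) - \psi^{(0)}(n_2-p+1)\right] + (n_2-n_1)\sum_{i=1}^{p-1}\frac{i}{(n_1-i)(n_2-i)}\right\}.$$

2) Rényi distance of order $0 < \beta < 1$:

Case (i):

$$d_{R}^{\beta}(\theta_1,\theta_2) = \frac{1}{\beta-1}\log\frac{I(\theta_1,\theta_2)}{2}, \tag{7}$$

where

$$I(\theta_1,\theta_2) = \left[\Gamma(n_1-p+1)^{p}\,n_1^{-pn_1}|\Sigma_1|^{n_1}\prod_{i=1}^{p-1}(n_1-i)^{i}\right]^{-\beta}\left[\Gamma(n_2-p+1)^{p}\,n_2^{-pn_2}|\Sigma_2|^{n_2}\prod_{i=1}^{p-1}(n_2-i)^{i}\right]^{\beta-1}\Gamma(E_{12}-p+1)^{p}\,|\Sigma_{12}|^{E_{12}}\prod_{i=1}^{p-1}(E_{12}-i)^{i}$$
$$\qquad + \left[\Gamma(n_1-p+1)^{p}\,n_1^{-pn_1}|\Sigma_1|^{n_1}\prod_{i=1}^{p-1}(n_1-i)^{i}\right]^{\beta-1}\left[\Gamma(n_2-p+1)^{p}\,n_2^{-pn_2}|\Sigma_2|^{n_2}\prod_{i=1}^{p-1}(n_2-i)^{i}\right]^{-\beta}\Gamma(E_{21}-p+1)^{p}\,|\Sigma_{21}|^{E_{21}}\prod_{i=1}^{p-1}(E_{21}-i)^{i},$$

where $E_{ij} = \beta n_i + (1-\beta)n_j$, for $i, j = 1, 2$, and $\Sigma_{ij} = (n_i\beta\Sigma_i^{-1} + n_j(1-\beta)\Sigma_j^{-1})^{-1}$.

Case (ii):

$$d_{R}^{\beta}(\theta_1,\theta_2) = \frac{\log 2}{1-\beta} + \frac{1}{\beta-1}\log\left\{\left[\frac{|(\beta\Sigma_1^{-1} + (1-\beta)\Sigma_2^{-1})^{-1}|}{|\Sigma_1|^{\beta}|\Sigma_2|^{1-\beta}}\right]^{n} + \left[\frac{|(\beta\Sigma_2^{-1} + (1-\beta)\Sigma_1^{-1})^{-1}|}{|\Sigma_1|^{1-\beta}|\Sigma_2|^{\beta}}\right]^{n}\right\}.$$

Case (iii):

$$d_{R}^{\beta}(\theta_1,\theta_2) = \frac{\log 2}{1-\beta} + \frac{1}{\beta-1}\log\left\{\left[\Gamma(n_1-p+1)^{p}\,n_1^{-pn_1}\prod_{i=1}^{p-1}(n_1-i)^{i}\right]^{-\beta}\left[\Gamma(n_2-p+1)^{p}\,n_2^{-pn_2}\prod_{i=1}^{p-1}(n_2-i)^{i}\right]^{\beta-1}\Gamma(E_{12}-p+1)^{p}\,E_{12}^{-pE_{12}}\prod_{i=1}^{p-1}(E_{12}-i)^{i}\right.$$
$$\qquad \left. + \left[\Gamma(n_1-p+1)^{p}\,n_1^{-pn_1}\prod_{i=1}^{p-1}(n_1-i)^{i}\right]^{\beta-1}\left[\Gamma(n_2-p+1)^{p}\,n_2^{-pn_2}\prod_{i=1}^{p-1}(n_2-i)^{i}\right]^{-\beta}\Gamma(E_{21}-p+1)^{p}\,E_{21}^{-pE_{21}}\prod_{i=1}^{p-1}(E_{21}-i)^{i}\right\}.$$

3) Bhattacharyya distance:

Case (i):

$$d_{B}(\theta_1,\theta_2) = \frac{n_1\log|\Sigma_1|}{2} + \frac{n_2\log|\Sigma_2|}{2} - \frac{n_1+n_2}{2}\log\left|\left(\frac{n_1\Sigma_1^{-1} + n_2\Sigma_2^{-1}}{2}\right)^{-1}\right| + \sum_{k=0}^{p-1}\log\frac{\sqrt{\Gamma(n_1-k)\Gamma(n_2-k)}}{\Gamma\left(\frac{n_1+n_2}{2}-k\right)} - \frac{p}{2}(n_1\log n_1 + n_2\log n_2). \tag{8}$$

Case (ii):

$$d_{B}(\theta_1,\theta_2) = n\left[\frac{\log|\Sigma_1| + \log|\Sigma_2|}{2} - \log\left|\left(\frac{\Sigma_1^{-1} + \Sigma_2^{-1}}{2}\right)^{-1}\right|\right].$$

Case (iii):

$$d_{B}(\theta_1,\theta_2) = p\,\frac{n_1+n_2}{2}\log\frac{n_1+n_2}{2} + \sum_{k=0}^{p-1}\log\frac{\sqrt{\Gamma(n_1-k)\Gamma(n_2-k)}}{\Gamma\left(\frac{n_1+n_2}{2}-k\right)} - \frac{p}{2}(n_1\log n_1 + n_2\log n_2).$$

4) Hellinger distance:

Case (i):

$$d_{H}(\theta_1,\theta_2) = 1 - \left[\frac{\left|2^{-1}(n_1\Sigma_1^{-1} + n_2\Sigma_2^{-1})\right|^{-\frac{n_1+n_2}{2}}}{|\Sigma_1|^{\frac{n_1}{2}}|\Sigma_2|^{\frac{n_2}{2}}}\right]\sqrt{n_1^{pn_1}n_2^{pn_2}}\prod_{k=0}^{p-1}\frac{\Gamma\left(\frac{n_1+n_2}{2}-k\right)}{\sqrt{\Gamma(n_1-k)\Gamma(n_2-k)}}. \tag{9}$$

Case (ii):

$$d_{H}(\theta_1,\theta_2) = 1 - \left[\frac{\left|\left(2^{-1}(\Sigma_1^{-1} + \Sigma_2^{-1})\right)^{-1}\right|}{\sqrt{|\Sigma_1||\Sigma_2|}}\right]^{n}.$$

Case (iii):

$$d_{H}(\theta_1,\theta_2) = 1 - \sqrt{n_1^{pn_1}n_2^{pn_2}}\left(\frac{2}{n_1+n_2}\right)^{\frac{p(n_1+n_2)}{2}}\prod_{k=0}^{p-1}\frac{\Gamma\left(\frac{n_1+n_2}{2}-k\right)}{\sqrt{\Gamma(n_1-k)\Gamma(n_2-k)}}.$$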
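The general (case (i)) Kullback-Leibler, Bhattacharyya, and Hellinger expressions can be evaluated directly. The following is a minimal sketch assuming NumPy; the function names are hypothetical, the Kullback-Leibler formula is written with the equivalent sum $\sum_{i=0}^{p-1}\psi^{(0)}(n-i)$ instead of the expanded form in (6), the Hellinger distance is obtained as $1 - \exp(-d_B)$ from the integral definitions, and the example uses the covariance matrix given later in (10) (with the off-diagonal entry read as $11050 + 3759i$, an assumption).

```python
import math
import numpy as np


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))


def logdet(a):
    """Log-determinant of a Hermitian positive definite matrix."""
    return float(np.linalg.slogdet(a)[1])


def d_kl(s1, n1, s2, n2):
    """Kullback-Leibler distance, general case."""
    p = s1.shape[0]
    psi_diff = sum(digamma(n1 - i) - digamma(n2 - i) for i in range(p))
    cross = np.trace(n2 * np.linalg.solve(s2, s1) + n1 * np.linalg.solve(s1, s2)).real
    return (0.5 * (n1 - n2) * (logdet(s1) - logdet(s2) - p * math.log(n1 / n2) + psi_diff)
            + 0.5 * cross - 0.5 * p * (n1 + n2))


def d_b(s1, n1, s2, n2):
    """Bhattacharyya distance, general case (8)."""
    p = s1.shape[0]
    m = 0.5 * (n1 + n2)
    a = 0.5 * (n1 * np.linalg.inv(s1) + n2 * np.linalg.inv(s2))
    log_gammas = sum(0.5 * (math.lgamma(n1 - k) + math.lgamma(n2 - k)) - math.lgamma(m - k)
                     for k in range(p))
    # -m * log|A^{-1}| equals +m * log|A|.
    return (0.5 * n1 * logdet(s1) + 0.5 * n2 * logdet(s2) + m * logdet(a)
            + log_gammas - 0.5 * p * (n1 * math.log(n1) + n2 * math.log(n2)))


def d_h(s1, n1, s2, n2):
    """Hellinger distance via d_H = 1 - exp(-d_B)."""
    return 1.0 - math.exp(-d_b(s1, n1, s2, n2))


def sigma_forest(x):
    """Covariance matrix (10); only the (1,1) entry varies."""
    upper = np.array([[x, 11050 + 3759j, 63896 + 1581j],
                      [0, 98960, 6593 + 6868j],
                      [0, 0, 208843]], dtype=complex)
    return upper + np.triu(upper, 1).conj().T


theta1 = (sigma_forest(360932.0), 8.0)
theta2 = (sigma_forest(400000.0), 6.0)
print(d_kl(*theta1, *theta2), d_b(*theta1, *theta2), d_h(*theta1, *theta2))
```

Log-determinants and `lgamma` are used throughout because the raw determinants of such matrices are of order $10^{15}$ and would overflow when raised to the power $n$.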

B. Sensitivity analysis

Now we examine the behavior of the statistics presented in Lemma 1 with respect to parameter variations, i.e., under the alternative hypotheses. These statistics are directly comparable since they all have the same asymptotic distribution; we used $N_1 = N_2 = 100$. Two simple alternative hypotheses are illustrated: changes in an entry in the diagonal of the covariance matrix, and changes in the number of looks.

Firstly, we assumed $n = 8$ looks, $\theta_1 = (\Sigma(360932), 8)$ and $\theta_2 = (\Sigma(x), 8)$, where

$$\Sigma(x) = \begin{bmatrix} x & 11050 + 3759i & 63896 + 1581i \\ & 98960 & 6593 + 6868i \\ & & 208843 \end{bmatrix}. \tag{10}$$

Since the covariance matrix is Hermitian, only the upper triangle and the diagonal are displayed. The fixed covariance matrix $\Sigma(360932)$ was previously analyzed in [6] in PolSAR data of forested areas.

Fig. 3(a) shows the statistics for $x \in [160000, 560000]$. They present roughly the same behavior.

Secondly, we considered fixed covariance matrices with varying equivalent number of looks: $\theta_1 = (\Sigma(360932), 8)$ and $\theta_2 = (\Sigma(360932), m)$, for $3 \leq m \leq 13$. Fig. 3(b) shows the statistics. It is noticeable that the test statistics are steeper to the left of the minimum. The number of looks, being a shape parameter, alters the distribution in a nonlinear fashion. Such change is perceived visually and by distance measures, and it is more intense for low values of the parameter. In other words, the difference between $\mathcal{W}_R(\Sigma, n)$ and $\mathcal{W}_R(\Sigma, kn)$, for any fixed $k > 1$ and any $\Sigma$, becomes smaller when $n$ increases.

Fig. 3. Sensitivity of statistics.
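The asymmetry with respect to the number of looks can be reproduced from the case (iii) Bhattacharyya expression alone, since it does not depend on $\Sigma$. The sketch below uses only the Python standard library; the function names are hypothetical, and the factor 8 in the statistic follows from $h'(0)\phi''(1) = 1/4$, obtained here by differentiating the Bhattacharyya functions of Table III (an inference, not a value quoted from the text).

```python
import math


def d_b_equal_sigma(n1, n2, p):
    """Bhattacharyya distance, case (iii): equal covariance matrices."""
    m = 0.5 * (n1 + n2)
    log_gammas = sum(0.5 * (math.lgamma(n1 - k) + math.lgamma(n2 - k)) - math.lgamma(m - k)
                     for k in range(p))
    return (p * m * math.log(m) + log_gammas
            - 0.5 * p * (n1 * math.log(n1) + n2 * math.log(n2)))


def s_b(n1, n2, p, size1, size2):
    """Bhattacharyya statistic: 8 N1 N2 / (N1 + N2) * d_B."""
    return 8.0 * size1 * size2 / (size1 + size2) * d_b_equal_sigma(n1, n2, p)


# Same setting as in the text: p = 3, reference n = 8, N1 = N2 = 100, 3 <= m <= 13.
for m in range(3, 14):
    print(m, round(s_b(8.0, float(m), 3, 100, 100), 2))
```

Comparing the printed values at equal offsets on either side of $m = 8$ (for instance $m = 5$ and $m = 11$) shows the larger values on the left-hand side.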
C. Invariance and inequalities

The derived distances are invariant under scalings of the covariance matrix $\Sigma$. In fact, it can be shown that

$$d_{\mathcal{M}}[(a\Sigma_1,n_1),(a\Sigma_2,n_2)] = d_{\mathcal{M}}[(\Sigma_1,n_1),(\Sigma_2,n_2)],$$

where $a$ is a positive real value and $\mathcal{M} \in \{KL, R, B, H\}$. This fact stems directly from the mathematical definition of these distances.

De Maio and Alfano [44] derived a new estimator for the covariance matrix under the complex Wishart model using inequalities relating the sought parameters. In the following we derive new inequalities for this model. Due to the major role of the covariance matrix in polarimetry [13], we limit our analysis to inequalities that depend on $\Sigma$.

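Case (ii) described in previous subsections paved the way for the new inequalities. The following results stem from the nonnegativity of the four distances:

$$\operatorname{tr}(\Sigma_1^{-1}\Sigma_2 + \Sigma_2^{-1}\Sigma_1) \geq 2p, \tag{11}$$

$$\left[\frac{|(\beta\Sigma_1^{-1} + (1-\beta)\Sigma_2^{-1})^{-1}|}{|\Sigma_1|^{\beta}|\Sigma_2|^{1-\beta}}\right]^{n} + \left[\frac{|(\beta\Sigma_2^{-1} + (1-\beta)\Sigma_1^{-1})^{-1}|}{|\Sigma_1|^{1-\beta}|\Sigma_2|^{\beta}}\right]^{n} \leq 2, \tag{12}$$

$$\frac{\log|\Sigma_1| + \log|\Sigma_2|}{2} \geq \log\left|\left(\frac{\Sigma_1^{-1} + \Sigma_2^{-1}}{2}\right)^{-1}\right|, \tag{13}$$

and

$$\sqrt{|\Sigma_1||\Sigma_2|} \geq \left|\left(\frac{\Sigma_1^{-1} + \Sigma_2^{-1}}{2}\right)^{-1}\right|, \tag{14}$$

respectively. Fixing $\beta = 1/2$ in (12), we obtain (14) directly; taking the logarithm of both sides of (14) yields (13). This result is justified by the following two relations:

1) $d_B(\theta_1,\theta_2) = d_R^{1/2}(\theta_1,\theta_2)/2$ and

2) $d_H(\theta_1,\theta_2) = 1 - \exp\left(-\tfrac{1}{2}d_R^{1/2}(\theta_1,\theta_2)\right)$.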
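Inequalities (11) and (13) can be checked numerically on randomly generated Hermitian positive definite matrices. The sketch below assumes NumPy; the generator of random covariance matrices and the function names are hypothetical. It also checks the determinant identity $|((\Sigma_1^{-1} + \Sigma_2^{-1})/2)^{-1}| = 2^p|\Sigma_1||\Sigma_2|/|\Sigma_1 + \Sigma_2|$, which follows from $\Sigma_1^{-1} + \Sigma_2^{-1} = \Sigma_1^{-1}(\Sigma_1 + \Sigma_2)\Sigma_2^{-1}$.

```python
import numpy as np


def random_covariance(p, rng):
    """Random Hermitian positive definite matrix (illustrative generator)."""
    a = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
    return a @ a.conj().T + 0.1 * np.eye(p)


def logdet(a):
    return float(np.linalg.slogdet(a)[1])


def check_inequalities(s1, s2):
    """Return which of the checked relations hold for the pair (s1, s2)."""
    p = s1.shape[0]
    trace_sum = np.trace(np.linalg.solve(s1, s2) + np.linalg.solve(s2, s1)).real
    harmonic = np.linalg.inv(0.5 * (np.linalg.inv(s1) + np.linalg.inv(s2)))
    return {
        "(11)": bool(trace_sum >= 2 * p - 1e-9),
        "(13)": bool(0.5 * (logdet(s1) + logdet(s2)) >= logdet(harmonic) - 1e-9),
        "identity": bool(np.isclose(
            logdet(harmonic),
            p * np.log(2.0) + logdet(s1) + logdet(s2) - logdet(s1 + s2))),
    }


rng = np.random.default_rng(2)
results = [check_inequalities(random_covariance(3, rng), random_covariance(3, rng))
           for _ in range(20)]
print(all(all(r.values()) for r in results))
```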

The revised Wishart [16] and Bartlett distances [5] can be obtained in a new and simple manner. Indeed, the revised Wishart distance ($d_{RW}$) can be derived after simple manipulations of inequality (11), yielding:

$$d_{RW}(\Sigma_1,\Sigma_2) = \frac{\operatorname{tr}(\Sigma_1^{-1}\Sigma_2 + \Sigma_2^{-1}\Sigma_1)}{2} - p \geq 0.$$

The Bartlett distance arises after taking the logarithm of both sides of inequality (14). Since $\Sigma_1^{-1} + \Sigma_2^{-1} = \Sigma_1^{-1}(\Sigma_1 + \Sigma_2)\Sigma_2^{-1}$, straightforward algebra leads to:

$$\ln\frac{|\Sigma_1 + \Sigma_2|^2}{|\Sigma_1||\Sigma_2|} - 2p\ln 2 \geq 0.$$

The left-hand side of the inequality above is referred to as the Bartlett distance [5], [16].

D. Finite sample size behavior

We assessed the influence of estimation on the size of the new hypothesis tests using simulated data. To that end, the study was conducted considering the following simulation parameters: number of looks $n = n_1 = n_2 \in \{4, 8, 16\}$ and the forest covariance matrix shown in (10) with $x = 360932$. The sample sizes relate to square windows of size 7 × 7, 11 × 11, and 20 × 20 pixels, i.e., $N_1, N_2 \in \{49, 121, 400\}$. Nominal significance levels $\alpha \in \{1\%, 5\%\}$ were verified.

Let $T$ be the number of Monte Carlo replicas and $R$ the number of cases for which the null hypothesis is rejected at nominal level $\alpha$. The empirical test size is given by $\widehat{\alpha}_{1-\alpha} = R/T$. Following the methodology described in [14], we employed $T = 5500$ replicas.

Table IV presents the empirical test sizes at 1% and 5% nominal levels, the execution time in milliseconds, the test statistic mean ($\bar{S}$), and coefficient of variation (CV). All numerical calculations and execution time measurements were performed on a PC with an Intel Core 2 Duo processor 2.10 GHz, 4 GB of RAM, Windows XP, and the R platform v. 2.8.1. For each case, the best obtained empirical sizes and distance means are in boldface. Results for $N_1 = 49$, $N_2 \in \{121, 400\}$ and $N_1 = 121$, $N_2 = 400$ are consistent with the ones shown, and are omitted for brevity.

We tested ten parameters: nine related to the covariance matrix of order $p = 3$, and the number of looks $L$, leading to test statistics which asymptotically follow $\chi^2_{10}$ distributions. Thus, the mean of the statistics over the replicas should converge in probability to 10, as a consequence of the weak law of large numbers. In Table IV notice that $\bar{S}$ tends to 10 as the sample size increases. By fixing the sample size while varying the number of looks $n$, test sizes obey the inequalities $S_H \leq S_B \leq S_R^{\beta} \leq S_{KL}$, as illustrated in Fig. 3. These inequalities suggest that, for this study, the statistic based on the Kullback-Leibler distance is the best discrimination measure.

Regarding execution times, the Kullback-Leibler-based test presented the best performance, while the test based on the Hellinger distance showed the best empirical test size in 6 out of 18 cases.

The presented methodology for assessing test sizes was also applied to the three forest samples from the E-SAR image shown in Fig. 1. Each sample was submitted to the following procedure [14]:

(i) split the sample in disjoint blocks of size $N_1$;

(ii) for each block from (i), split the remaining sample in disjoint blocks of size $N_2$;

(iii) perform the hypothesis test as described in Proposition 1 for each pair of samples with sizes $N_1$ and $N_2$.

Table V presents the results, omitting some entries as in Table IV. All test sizes were smaller than the nominal level, i.e., the proposed tests do not reject the null hypothesis when similar samples are considered.

We also carried out the following study on the power of the tests. In each one of $T$ Monte Carlo experiments, two samples of random matrices, both of size $N \in \{9, 16, 25, 36, 49, 64, 81, 100, 121, 144\}$, were drawn from the $\mathcal{W}_R(\Sigma(360932), 4)$ and from the $\mathcal{W}_R(\Sigma(360932)\cdot(1 + 0.2), 4)$ distributions. The covariance matrix $\Sigma(x)$ is given in (10), and the experiment consists in contrasting samples from the relaxed Wishart distribution it indexes with samples from the law indexed by a version scaled by 1.2, arbitrarily chosen. Subsequently, it was verified whether these samples come from similar populations according to Proposition 1. Let $R$ be the number of situations for which the null hypothesis is rejected at nominal level $\alpha$; the empirical test power is given by $R/T$. Fig. 4 presents these estimates for the test power. Notice that the discrimination ability is about the same for all tests above $N = 49$.

In general terms, the proposed hypothesis tests presented good results regarding their power even for small samples: with samples of size 49, they are able to discriminate between covariance matrices which are only 20% different about 80% of the time. As the sample size increases, the discrimination ability of all the tests improves.
TABLE IV
EMPIRICAL SIZES FOR $B_1$

n = 4
Statistic     N1    N2    1%       5%       time (ms)   $\bar{S}$   CV
$S_{KL}$      49    49    1.309    5.491    0.44        10.189      44.804
$S_{KL}$      121   121   0.818    4.545    0.43        9.843       44.743
$S_{KL}$      400   400   1.055    4.836    0.42        9.950       45.272
$S_R^\beta$   49    49    1.255    5.309    0.54        10.157      44.658
$S_R^\beta$   121   121   0.800    4.473    0.57        9.830       44.687
$S_R^\beta$   400   400   1.036    4.836    0.57        9.946       45.255
$S_B$         49    49    1.164    5.055    1.10        10.101      44.408
$S_B$         121   121   0.782    4.418    1.11        9.809       44.588
$S_B$         400   400   1.036    4.836    0.99        9.939       45.224
$S_H$         49    49    0.655    3.891    1.10        9.797       43.017
$S_H$         121   121   0.618    4.018    1.11        9.691       44.054
$S_H$         400   400   1.036    4.691    0.99        9.902       45.056

n = 8
Statistic     N1    N2    1%       5%       time (ms)   $\bar{S}$   CV
$S_{KL}$      49    49    1.472    6.291    0.49        10.292      45.873
$S_{KL}$      121   121   1.255    5.618    0.49        10.052      45.241
$S_{KL}$      400   400   1.055    5.327    0.50        10.073      44.895
$S_R^\beta$   49    49    1.436    6.127    0.51        10.272      45.782
$S_R^\beta$   121   121   1.236    5.582    0.61        10.044      45.203
$S_R^\beta$   400   400   1.054    5.327    0.57        10.071      44.885
$S_B$         49    49    1.418    5.873    1.12        10.235      45.624
$S_B$         121   121   1.218    5.473    0.99        10.030      45.135
$S_B$         400   400   1.055    5.323    1.13        10.066      44.866
$S_H$         49    49    0.891    4.400    1.12        9.920       44.184
$S_H$         121   121   0.927    5.018    0.99        9.906       44.580
$S_H$         400   400   0.909    5.145    1.13        10.029      44.698

n = 16
Statistic     N1    N2    1%       5%       time (ms)   $\bar{S}$   CV
$S_{KL}$      49    49    1.509    6.364    0.39        10.400      45.828
$S_{KL}$      121   121   1.000    4.836    0.48        9.982       44.309
$S_{KL}$      400   400   1.036    4.655    0.46        10.051      44.539
$S_R^\beta$   49    49    1.473    6.291    0.58        10.385      45.763
$S_R^\beta$   121   121   0.964    4.800    0.56        9.976       44.286
$S_R^\beta$   400   400   1.036    4.655    0.65        10.049      44.531
$S_B$         49    49    1.436    6.164    1.07        10.358      45.650
$S_B$         121   121   0.963    4.745    1.12        9.967       44.245
$S_B$         400   400   1.036    4.636    1.20        10.046      44.515
$S_H$         49    49    1.000    4.782    1.07        10.035      44.225
$S_H$         121   121   0.855    4.091    1.12        9.845       43.705
$S_H$         400   400   1.018    4.473    1.20        10.009      44.348



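TABLE V
EMPIRICAL SIZES FOR FORESTS

Sample $B_1$
Statistic     N1    N2    1%     5%     $\bar{S}$ ($\times 10^{-?}$)   CV
$S_{KL}$      49    49    0.00   0.00   59.80                          157.36
$S_{KL}$      121   121   0.00   0.00   38.27                          100.85
$S_R^\beta$   49    49    0.00   0.00   40.77                          153.99
$S_R^\beta$   121   121   0.00   0.00   26.56                          100.41
$S_B$         49    49    0.00   0.00   41.92                          148.92
$S_B$         121   121   0.00   0.00   28.05                          99.66
$S_H$         49    49    0.00   0.00   36.51                          132.59
$S_H$         121   121   0.00   0.00   26.41                          96.60

Sample $B_2$
Statistic     N1    N2    1%     5%     $\bar{S}$ ($\times 10^{-?}$)   CV
$S_{KL}$      49    49    0.00   0.00   45.16                          64.18
$S_{KL}$      121   121   0.00   0.00   19.76                          61.66
$S_R^\beta$   49    49    0.00   0.00   31.40                          63.83
$S_R^\beta$   121   121   0.00   0.00   13.79                          61.51
$S_B$         49    49    0.00   0.00   33.25                          63.22
$S_B$         121   121   0.00   0.00   14.70                          61.23
$S_H$         49    49    0.00   0.00   31.61                          60.27
$S_H$         121   121   0.00   0.00   14.38                          59.97

Sample $B_3$
Statistic     N1    N2    1%     5%     $\bar{S}$ ($\times 10^{-?}$)   CV
$S_{KL}$      49    49    0.00   0.00   47.05                          87.32
$S_{KL}$      121   121   0.00   0.00   29.54                          87.72
$S_R^\beta$   49    49    0.00   0.00   32.61                          86.42
$S_R^\beta$   121   121   0.00   0.00   20.54                          87.17
$S_B$         49    49    0.00   0.00   34.34                          84.96
$S_B$         121   121   0.00   0.00   21.77                          86.24
$S_H$         49    49    0.00   0.00   32.22                          79.05
$S_H$         121   121   0.00   0.00   20.89                          82.53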
Fig. 4. Empirical test power in semilogarithmic scale (horizontal axis: sample size).



V. Applications 



This section presents two applications of the tests based on stochastic distances. Firstly, a discrimination analysis was performed in order to assess the influence of image texture on the tests. It is known that the complex Wishart distribution is more appropriate for describing homogeneous regions. However, other polarimetric distributions, potentially more apt to describe textured areas, yield intractable expressions which depend on special functions, such as the hypergeometric and modified Bessel functions. In order to quantify textures, we considered distances between relaxed scaled complex Wishart laws as proposed by Anfinsen et al. pT| .
Secondly, stochastic distances are embedded into the $k$-means method in order to identify groups in PolSAR data. The performance of four distances was assessed by means of a synthetic image generated from the relaxed scaled complex Wishart distribution.
A. Discrimination analysis

AIRSAR was an airborne mission with PolSAR capabilities, designed and built by the Jet Propulsion Laboratory, which operated at P-, L-, and C-bands [45]. Fig. 5 shows a 550 × 645 pixel image (HH channel) of San Francisco recorded by this sensor, acquired with four nominal looks. Nine areas were chosen to represent three different degrees of roughness: homogeneous, heterogeneous, and extremely heterogeneous, labeled as $A_i$, $B_i$, and $C_i$, respectively, for $i = 1, 2, 3$.




Fig. 5. AIRSAR HH data and samples 

The parameters of the complex Wishart distribution were estimated by maximum likelihood, cf. Eq. (3).



Table VI presents the estimated number of looks and determinants of the complex covariance matrices for each area, along with the number of observations.

Goodman [22] studied the distribution of the determinant of the complex covariance matrix, which is classically understood as a generalized variance. In PolSAR, this quantity is related to the speckle variability, defined as the effect of the speckle noise resulting from multipath interference. Additionally, variability due to texture is caused by the spatial variability of the reflectance, and it is understood as "heterogeneity" or "roughness". This source of variability can be captured by, for instance, the roughness parameter of the polarimetric $\mathcal{G}^0$ law [46], [47].

We observed that the elements of the covariance matrices become larger along with the determinant when the heterogeneity increases. The most homogeneous region, $A_1$, has the covariance matrix with the smallest determinant. Sample $B_1$ has the largest determinant among heterogeneous regions. This suggests that this last sample is the most heterogeneous, even with the presence of a double bounce [48] in sample $B_3$. Urban areas (labeled $C_1$, $C_2$, and $C_3$ in Fig. 5), which are extremely heterogeneous targets, lead to the largest determinants. Additionally, the estimated number of looks decreases with the heterogeneity.

TABLE VI
ESTIMATED NUMBER OF LOOKS AND GENERALIZED VARIANCE

Region   Estimate           i = 1              i = 2              i = 3
$A_i$    $\hat{n}$          4.04               3.75               3.98
         $|\hat{\Sigma}|$   3.24 × 10^?        59.71 × 10^?       18.83 × 10^?
         # pixels           15960              10339              11449
$B_i$    $\hat{n}$          3.05               3.21               3.15
         $|\hat{\Sigma}|$   35.86 × 10^?       7.38 × 10^?        10.87 × 10^?
         # pixels           17385              9152               5499
$C_i$    $\hat{n}$          3.03               3.12               3.09
         $|\hat{\Sigma}|$   1.97 × 10^-?       1.46 × 10^-?       1.17 × 10^-?
         # pixels           20320              8034               13770



Stochastic distances were computed between pairs of these estimated distributions. Table VII shows the distances between regions of the same class. In all but one case, the values were found to be ordered as follows: $d_{KL} > d_R^\beta > d_B > d_H >$ d^^. The only discrepancy occurs when comparing the homogeneous regions $A_1$ and $A_2$, where the last inequality is not preserved.

TABLE VII
DISTANCES BETWEEN REGIONS OF SIMILAR ROUGHNESS

Regions         $d_{KL}$   $d_R^\beta$   $d_B$   $d_H$   d^
$A_1$-$A_2$     19.83      13.13         2.66    0.93    1.46
$A_1$-$A_3$     7.34       5.91          1.41    0.76    0.66
$A_2$-$A_3$     2.01       1.73          0.45    0.36    0.19
$B_1$-$B_2$     1.83       1.58          0.41    0.34    0.18
$B_1$-$B_3$     1.11       0.97          0.26    0.23    0.11
$B_2$-$B_3$     0.26       0.23          0.06    0.06    0.03
$C_1$-$C_2$     0.35       0.31          0.09    0.08    0.03
$C_1$-$C_3$     0.21       0.18          0.05    0.05    0.02
$C_2$-$C_3$     0.19       0.17          0.05    0.05    0.02



Table VIII presents the distances between regions of different roughness. Similarly to the univariate case [14], in these cases regions become more distinguishable. In all cases, the distances satisfy

$$d(\hat{\theta}_k, \hat{\theta}_\ell) < d(\hat{\theta}_k, \hat{\theta}_m) \quad \text{if} \quad \big| |\hat{\Sigma}_k| - |\hat{\Sigma}_\ell| \big| < \big| |\hat{\Sigma}_k| - |\hat{\Sigma}_m| \big|.$$

As expected, distances between samples of different classes are much larger than those between samples with similar roughness.

TABLE VIII
DISTANCES BETWEEN REGIONS OF DIFFERENT ROUGHNESS

Regions         $d_{KL}$   $d_R^\beta$   $d_B$   $d_H$   dV
$A_1$-$B_1$     621.16     91.09         12.38   1.00    10.12
$A_1$-$B_2$     318.09     74.13         10.61   1.00    8.24
$A_1$-$B_3$     397.15     79.29         11.08   1.00    8.81
$A_2$-$B_1$     222.93     60.04         8.72    1.00    6.67
$A_2$-$B_2$     119.27     46.15         7.23    1.00    5.13
$A_2$-$B_3$     152.90     51.65         7.81    1.00    5.74
$A_3$-$B_1$     361.81     72.51         10.14   1.00    8.06
$A_3$-$B_2$     185.07     57.19         8.50    1.00    6.35
$A_3$-$B_3$     239.22     62.80         9.05    1.00    6.98
$A_1$-$C_1$     1559.32    110.64        14.65   1.00    12.29
$A_1$-$C_2$     1469.62    109.18        14.57   1.00    12.13
$A_1$-$C_3$     1393.29    106.23        14.16   1.00    11.80
$A_2$-$C_1$     621.61     77.63         10.70   1.00    8.62
$A_2$-$C_2$     575.70     75.85         10.56   1.00    8.43
$A_2$-$C_3$     554.52     74.31         10.29   1.00    8.26
$A_3$-$C_1$     1010.30    90.89         12.26   1.00    10.10
$A_3$-$C_2$     954.88     89.29         12.14   1.00    9.92
$A_3$-$C_3$     907.62     87.21         11.81   1.00    9.69
$B_1$-$C_1$     4.76       3.91          0.95    0.61    0.43
$B_1$-$C_2$     4.23       3.54          0.88    0.59    0.39
$B_1$-$C_3$     3.88       3.25          0.81    0.55    0.36
$B_2$-$C_1$     13.41      9.50          2.02    0.87    1.06
$B_2$-$C_2$     12.44      8.97          1.93    0.86    1.00
$B_2$-$C_3$     11.18      8.10          1.75    0.83    0.90
$B_3$-$C_1$     9.42       7.24          1.65    0.81    0.80
$B_3$-$C_2$     8.71       6.79          1.57    0.79    0.75
$B_3$-$C_3$     7.59       5.96          1.38    0.75    0.66

B. Clustering with stochastic distances

A common characteristic of segmentation and classification algorithms is their sensitivity to the dissimilarity measure they employ [16], [49]. As already presented, stochastic distances have good discriminatory properties and, therefore, can be used for identifying clusters in PolSAR data. To that end, in the following we use the $k$-means method with these measures applied, firstly, to synthetic data and, secondly, to a PolSAR image.

Consider $N$ observed covariance matrices $Z_i$, $1 \le i \le N$, and assume that each observation belongs to a class $H_i$, $1 \le i \le k$, with $k$ known. Each class can be characterized by an unknown centroid $C_i$, $1 \le i \le k$, and the task is to assign each observation to a single class. Algorithm 1 performs this task using stochastic distances as dissimilarity criteria.

Fig. 6(a) presents a simulated PolSAR image of 75 × 80 pixels with ten regions generated from the scaled complex Wishart law with 12 looks. The regions have low, intermediate, and high brightness. The intermediate brightness region has the covariance matrix given in (10), while the regions of largest and smallest brightness are simulated from the following matrices, respectively (only the upper triangle of each Hermitian matrix is shown):

$$\begin{bmatrix} 962892 & 19171 - 3579i & -154638 + 191388i \\ & 56707 & -5798 + 16812i \\ & & 472251 \end{bmatrix}$$

and

$$\begin{bmatrix} 32556 & 556 + 787i & 24046 - 27287i \\ & 1647 & -146 - 482i \\ & & 61028 \end{bmatrix}.$$

These two matrices were observed in [6], in urban and pasture regions, respectively.

Algorithm 1 $k$-means using stochastic distances
1: Choose a set of arbitrary initial $k$ centroids $C = \{C_1, C_2, \ldots, C_k\}$.
2: For each $i \in \{1, 2, \ldots, k\}$, set the cluster $\mathcal{F}_i$ as the set of pixels which are closer to $C_i$ than to $C_j$ (for all $i \neq j$) according to the rule
   $$\mathcal{F}_i = \Big\{ Z_\ell : d_M([Z_\ell, n], [C_i, n]) < \min_{j \neq i} d_M([Z_\ell, n], [C_j, n]) \Big\},$$
   where $d_M$ is a stochastic distance.
3: Reset $C_i$ as the sample mean of the elements of $\mathcal{F}_i$ defined in step 2, for $i \in \{1, 2, \ldots, k\}$.
4: Repeat steps 2 and 3 until $C$ no longer changes, that is, until the following measure assumes zero value:
   $$H(v) = \sum_{j=1}^{\#\,\text{pixels}} \sum_{i=1}^{k} \big| I_{\mathcal{F}_i}(X_{j,v}) - I_{\mathcal{F}_i}(X_{j,v-1}) \big|$$
   for all $v \ge 1$, where $X_{j,v}$ represents the $j$th element of the vector of labels associated to the elements of the vectorization of the data matrix at the $v$th iteration, $X_{j,0}$ is the $j$th label of the initial solution, and $I_{\mathcal{F}_i}(\cdot)$ is the indicator function of the set $\mathcal{F}_i$.
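The steps of Algorithm 1 can be sketched in a few lines. The version below is only an illustration under simplifying assumptions that are not in the paper: observations are scalar intensities (the $p = 1$ case), the dissimilarity is the Kullback-Leibler distance between laws with the same number of looks, and the names `kl_distance` and `kmeans_stochastic` are hypothetical. For polarimetric data the same loop applies with matrix-valued observations and a matrix-valued distance passed as `distance`.

```python
def kl_distance(z, c, looks):
    """Kullback-Leibler distance between the laws indexed by [z, n] and [c, n]
    in the scalar (p = 1) case with the same number of looks n."""
    return looks * ((z / c + c / z) / 2.0 - 1.0)


def kmeans_stochastic(data, initial_centroids, looks, distance=kl_distance, max_iter=100):
    """k-means where closeness is measured by a stochastic distance."""
    centroids = list(initial_centroids)      # step 1: arbitrary initial centroids
    labels = [None] * len(data)
    for _ in range(max_iter):
        # Step 2: assign each observation to its closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda i: distance(z, centroids[i], looks))
            for z in data
        ]
        # Step 4: stop when no label changed, which is equivalent to H(v) = 0.
        changed = sum(a != b for a, b in zip(new_labels, labels))
        labels = new_labels
        if changed == 0:
            break
        # Step 3: reset each centroid as the sample mean of its cluster.
        for i in range(len(centroids)):
            members = [z for z, lab in zip(data, labels) if lab == i]
            if members:                      # keep the old centroid if the cluster is empty
                centroids[i] = sum(members) / len(members)
    return labels, centroids
```

For instance, `kmeans_stochastic([1.0, 1.1, 0.9, 10.0, 11.0, 9.0], [0.5, 20.0], looks=4)` separates the low and high intensities into two clusters. Keeping the previous centroid for an empty cluster is a design choice of this sketch, since Algorithm 1 does not state how that case is handled.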



Fig. 6(b) shows the initial stage in the clustering process, which is quite far from the ideal solution. Figs. 6(c) to 6(f) present the results of using the $k$-means algorithm based on stochastic distances. Notice that all the distances were able to identify clusters accurately, with only a few spurious spots.

We applied this methodology to a 182 × 210 pixel area from the San Francisco AIRSAR image (Fig. 7(a)). This area is composed of urban and forest regions. Fig. 7(b) shows the initial stage of the clustering analysis, which was randomly generated. Notice that d^^ gathers more pixels of urban regions than the other distances. The Kullback-Leibler distance presented the worst performance in terms of the identification of pixels of urban scenarios. This may be due to the departure, in such situations, from the assumption that a region is Wishart distributed.



IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 



Fig. 6. Clustering a synthetic PolSAR image with $k$-means and stochastic distances: (a) synthetic image; (b) initial solution; (c) $d_{KL}$ clusters; (d) $d_B$ clusters; (e) $d_H$ clusters; (f) d^^ clusters.



VI. Conclusions 

Analytic expressions of four contrast measures between relaxed complex Wishart distributions were derived for the most general case (different number of looks and different covariance matrices), along with the particular cases of same number of looks and same covariance matrix. These measures are shown to be scale invariant, and they lead to test statistics with asymptotic $\chi^2$ distribution under the null hypothesis. Novel inequalities which relate covariance matrices and distances were derived, leading to a new and simple derivation of the revised Wishart and Bartlett distances. These new expressions can be used in a variety of applications such as segmentation and classification.

Those stochastic distances were successfully used as dissimilarities in a $k$-means algorithm. Data from AIRSAR sensors confirmed the expected behavior of all the distances: distances are smaller when applied to samples of similar roughness, and larger otherwise.

Fig. 7. Clustering a PolSAR image with $k$-means and stochastic distances: (a) AIRSAR image; (b) random initialization; (c) $d_{KL}$ clusters; (d) $d_B$ clusters; (e) $d_H$ clusters; (f) d^^ clusters.

All the proposed statistics based on stochastic distances presented good performance with finite size samples. In particular, the results provided evidence that the test based on $d_H$ has the smallest empirical test size in a variety of situations. This behavior was confirmed with samples from a PolSAR sensor.

We presented numerical evidence that the statistics based on the Hellinger distance outperform the other statistics. Our results confirm previous studies which pointed to the Bartlett distance (a particular case of the Hellinger distance for the same number of looks) as the best option on Wishart distributed data. Therefore, the Hellinger test statistic derived from the $(h, \phi)$ class of divergences is a reasonable statistical method for assessing whether two samples of polarimetric data come from the same distribution.

It is noteworthy that the tests considered here tend to reject more often than their nominal levels when dealing with small samples and a small number of looks. Thus, a study of the influence of improved estimators (bias reduction by numerical and analytical approaches, and robust versions, for instance) for the parameters $n$ and $\Sigma$ on the performance of the proposed hypothesis tests is an avenue for new research.

Further research will consider models which include heterogeneity [16], [46], [47], robust, improved and nonparametric inference [50]-[53], and small-sample issues.
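Appendix A
The Kullback-Leibler distance in general form

The Kullback-Leibler distance is given by

$$d_{KL}(\theta_1, \theta_2) = \frac{n_1 - n_2}{2} \big[ E(\log|Z_1|) - E(\log|Z_2|) \big] - \frac{1}{2} E\big[ n_1 \operatorname{tr}(\Sigma_1^{-1} Z_1) - n_2 \operatorname{tr}(\Sigma_2^{-1} Z_1) \big] + \frac{1}{2} E\big[ n_1 \operatorname{tr}(\Sigma_1^{-1} Z_2) - n_2 \operatorname{tr}(\Sigma_2^{-1} Z_2) \big], \quad (15)$$

where $Z_i \sim \mathcal{W}_{\mathcal{R}}(\Sigma_i, n_i)$, $i = 1, 2$. According to Anfinsen et al. [20], we have:

$$E(\log|Z_i|) = \log|\Sigma_i| + \sum_{k=0}^{p-1} \psi^{(0)}(n_i - k) - p \log n_i = \log|\Sigma_i| + p\,\psi^{(0)}(n_i - p + 1) + \sum_{k=1}^{p-1} \frac{k}{n_i - k} - p \log n_i, \quad (16)$$

since $\psi^{(0)}(x + 1) = \psi^{(0)}(x) + x^{-1}$, for any real $x > 0$. Additionally,

$$E\big[\operatorname{tr}(\Sigma_j^{-1} Z_i)\big] = \sum_{k=1}^{p} \sum_{\ell=1}^{p} s_{k\ell j}\, E(z_{\ell k i}) = \begin{cases} \operatorname{tr}(\Sigma_j^{-1} \Sigma_i), & \text{if } i \neq j, \\ p, & \text{if } i = j, \end{cases} \quad (17)$$

where $s_{k\ell j}$ and $z_{k\ell i}$ are the $(k, \ell)$-th entries of the matrices $\Sigma_j^{-1}$ and $Z_i$, respectively. Hence, applying (16) and (17) into (15) yields (7).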
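The two expressions for $E(\log|Z_i|)$ in (16) can be checked numerically. The sketch below is an illustration only: the digamma function is approximated with its recurrence and a truncated asymptotic series (an implementation choice made here to avoid external libraries), and the function names are hypothetical.

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def expected_log_det_sum(log_det_sigma, looks, p):
    """First form in (16): sum of digamma terms."""
    return log_det_sigma + sum(digamma(looks - k) for k in range(p)) - p * math.log(looks)


def expected_log_det_unrolled(log_det_sigma, looks, p):
    """Second form in (16): a single digamma term plus a rational sum."""
    return (log_det_sigma
            + p * digamma(looks - p + 1)
            + sum(k / (looks - k) for k in range(1, p))
            - p * math.log(looks))
```

Both functions require `looks > p - 1`, so that every digamma argument is positive; under that condition they agree up to the accuracy of the approximation.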